Mining top-k strongly correlated item pairs without minimum correlation threshold
Abstract
Given a user-specified minimum correlation threshold and a transaction database, the problem of mining strongly correlated item pairs is to find all item pairs whose Pearson's correlation coefficients are above the threshold. However, setting such a threshold is by no means an easy task. In this paper, we consider a more practical problem: mining the top-k strongly correlated item pairs, where k is the desired number of item pairs with the largest correlation values. Based on the FP-tree data structure, we propose an efficient algorithm, called Tkcp, for mining such patterns without a minimum correlation threshold. Our experimental results show that the Tkcp algorithm outperforms the Taper algorithm, an efficient algorithm for mining correlated item pairs, even under the assumption of an optimally chosen correlation threshold. Thus, we conclude that mining top-k strongly correlated pairs without a minimum correlation threshold is preferable to the original threshold-based mining.
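To make the problem statement concrete, the following is a minimal brute-force sketch of top-k correlated pair mining. It is not the Tkcp algorithm and uses no FP-tree; it simply scores every item pair with Pearson's correlation for binary items (the phi coefficient) computed from support counts, then keeps the k largest. The function names `phi` and `top_k_pairs` are hypothetical, and treating an item that occurs in every transaction as having correlation 0 is an assumption made here for simplicity.

```python
import heapq
from itertools import combinations
from math import sqrt


def phi(n, n_a, n_b, n_ab):
    """Pearson correlation (phi coefficient) of two binary items.

    n    -- number of transactions
    n_a  -- transactions containing item a
    n_b  -- transactions containing item b
    n_ab -- transactions containing both a and b
    """
    denom = sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    if denom == 0:
        # Assumption: an item present in all (or no) transactions has an
        # undefined correlation; it is scored as 0 here.
        return 0.0
    return (n * n_ab - n_a * n_b) / denom


def top_k_pairs(transactions, k):
    """Return the k item pairs with the largest phi values (brute force)."""
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for t in transactions:
        items = sorted(set(t))  # de-duplicate and fix a pair ordering
        for a in items:
            item_count[a] = item_count.get(a, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    # Score every pair of observed items, including pairs that never co-occur.
    scored = (
        (phi(n, item_count[a], item_count[b], pair_count.get((a, b), 0)), (a, b))
        for a, b in combinations(sorted(item_count), 2)
    )
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    db = [["a", "b"], ["a", "b"], ["c"], ["c", "d"]]
    for score, pair in top_k_pairs(db, 2):
        print(pair, round(score, 3))
```

No threshold appears anywhere in this sketch: the user supplies only k. The cost is that every pair is scored, which is the quadratic work that pruning-based methods are designed to avoid.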
Similar resources
Top-k Correlation Computation
Recently, there has been considerable interest in efficiently computing strongly correlated pairs in large databases. Most previous studies require the specification of a minimum correlation threshold to perform the computation. However, it may be difficult for users to provide an appropriate threshold in practice, since different data sets typically have different characteristics. To this end,...
A FP-Tree Based Approach for Mining All Strongly Correlated Pairs without Candidate Generation
Given a user-specified minimum correlation threshold and a transaction database, the problem of mining all-strong correlated pairs is to find all item pairs with Pearson's correlation coefficients above the threshold. Despite the use of an upper-bound-based pruning technique in the Taper algorithm [1], when the numbers of items and transactions are very large, candidate pair generation and test is...
Extracting Support Based k most Strongly Correlated Item Pairs in Large Transaction Databases
The support-confidence framework is misleading in finding statistically meaningful relationships in market basket data. The alternative is to find strongly correlated item pairs from the basket data. However, strongly correlated pair queries suffer from the problem of setting a suitable threshold. To overcome that, the top-k pairs finding problem has been introduced. Most of the existing techniques are multi-p...
Scaling up top-K cosine similarity search
Article history: received 21 September 2009; received in revised form 23 August 2010; accepted 23 August 2010; available online 8 September 2010. Recent years have witnessed an increased interest in computing cosine similarity in many application domains. Most previous studies require the specification of a minimum similarity threshold to perform the cosine similarity computation. However, it is us...
Efficient Mining of Top-K Closed Sequences
Sequence mining is an important data mining task. To retrieve interesting sequences from a large database, a minimum support threshold must be specified. Unfortunately, specifying an appropriate support threshold is very difficult for users who are new to mining queries and task-specific data. To avoid the difficulty of specifying an appropriate support thre...
Journal: KES Journal
Volume: 10
Year of publication: 2006